The D-TUNA Corpus: A Dutch Dataset for the Evaluation of Referring Expression Generation Algorithms

Authors

Ruud Koolen

Emiel Krahmer

Abstract

In this paper, we present the D-TUNA corpus, which is the first semantically annotated corpus of referring expressions in Dutch. Its primary function is to evaluate and improve the performance of REG algorithms. Such algorithms are computational models that automatically generate referring expressions by computing how a specific target can be identified to an addressee by distinguishing it from a set of distractor objects. We performed a large-scale production experiment, in which participants were asked to describe furniture items and people, and provided all descriptions with semantic information regarding the target and the distractor objects. Besides being useful for evaluating REG algorithms, the corpus addresses several other research goals. Firstly, the corpus contains both written and spoken referring expressions uttered in the direction of an addressee, which enables systematic analyses of how modality (text or speech) influences the human production of referring expressions. Secondly, due to its comparability with the English TUNA corpus, our Dutch corpus can be used to explore the differences between Dutch and English speakers regarding the production of referring expressions.

Similar resources

Generation of Dutch referring expressions using the D-TUNA corpus

This paper describes our research into generating Dutch noun phrases as descriptions of furniture objects or people. This is usually done in two steps: attribute selection and realisation. This research focuses only on the realisation step: generating a noun phrase from given attributes. The research is done on the Dutch version of the TUNA-corpus, which contains annotated human-produced descri...

G-TUNA: a corpus of referring expressions in German, including duration information

Corpora of referring expressions elicited from human participants in a controlled environment are an important resource for research on automatic referring expression generation. We here present G-TUNA, a new corpus of referring expressions for German. Using images of furniture as stimuli similarly to the TUNA and D-TUNA corpora, our corpus extends on these corpora by providing data collected i...

The TUNA Challenge 2008: Overview and Evaluation Results

The TUNA Challenge was a set of three shared tasks at REG’08, all of which used data from the TUNA Corpus. The three tasks covered attribute selection for referring expressions (TUNA-AS), realisation (TUNA-R) and end-to-end referring expression generation (TUNA-REG). Eight teams submitted a total of 33 systems to the three tasks, with an additional submission to the Open Track. The evaluation used a ...

Introducing Shared Task Evaluation to NLG: The TUNA Shared Task Evaluation Challenges

Shared Task Evaluation Challenges (STECs) have only recently begun in the field of NLG. The TUNA STECs, which focused on Referring Expression Generation (REG), have been part of this development since its inception. This chapter looks back on the experience of organising the three TUNA Challenges, which came to an end in 2009. While we discuss the role of the STECs in yielding a substantial bod...

XML Format Guidelines for the TUNA Corpus

This document forms part of the 2008 distribution of the TUNA Corpus, Version 1.0. This is the first public release of the complete TUNA Corpus of Referring Expressions. A subset of the corpus was used in the first Shared Task and Evaluation Challenge for NLG, the Attribute Selection for the Generation of Referring Expressions Challenge (ASGRE), co-located with the Workshop on Using Corpora in ...

Publication date: 2010
